Quantum decision making in automatic driving

Estimating the behavioral intentions of human traffic participants and interacting with them are key problems for an Automatic Driving System (ADS). Studies based on classical decision theory implicitly assume that human traffic participants behave completely rationally. However, according to quantum decision theory, which has developed rapidly in recent years, and to actual traffic cases, traffic behaviors and other human behaviors are often irrational and violate the assumptions of classical cognitive and decision theory. This paper explores decision-making in a two-car game scene based on quantum decision theory and compares it with the current mainstream method for studying irrational behavior, the Cumulative Prospect Theory (CPT) model. The comparative analysis shows that the Quantum Game Theory (QGT) model can explain the separation effect (known in the quantum cognition literature as the disjunction effect), which the classical probability model cannot reveal, and that it has advantages over the CPT model in game-scene decision-making. When two cars interact, the QGT model can consider the interests of both sides from the perspective of the other car. Compared with the classical probability model and the CPT model, the QGT model describes behavior decision-making for ADS more realistically.

Autonomous vehicles will inevitably share urban roads with human traffic participants for a long time to come 1. To drive safely and efficiently in this complex traffic environment, autonomous vehicles need to estimate the behavioral intentions of human traffic participants correctly and interact with them as naturally as human-driven vehicles do 2,3. In the real world, the behavior of human traffic participants and their interactions are highly random. Osamu proposed that such randomness is characterized by obvious uncertainty and irrationality 4. The "long tail" problem of autonomous driving includes various fragmented scenarios, extreme situations and unpredictable human behavior. It is related to irrational behavioral intentions and uncertainty 5, which need to be studied with a correct and effective cognitive and decision theory.
The mainstream methods of behavior decision-making are traditional machine learning methods based on classical probabilistic reasoning 6,7 and data-driven deep learning methods. Traditional machine learning methods generally assume that the evolution of traffic participants is Markovian and use Markov Decision Processes (MDP), Hidden Markov Models (HMM), Dynamic Bayesian Networks (DBN) and similar methods to infer intentions; the most widely used, the Partially Observable Markov Decision Process (POMDP), is then used to obtain the behavior decision. However, existing results in the theory of human behavior decision-making show that human behavior is incompatible with the complete-rationality hypothesis of classical decision-making theory 8, and that cognitive and decision-making theory based on classical probability cannot accurately describe human behavior and its interaction 9. These are the main bottlenecks restricting safe and efficient autonomous driving in actual urban traffic scenes 10. Data-driven deep learning methods must rely on training with massive data samples (the accumulated actual driving distance needs to reach tens of billions of miles) to deal with the "long tail" problem that may cause driving accidents 11,12, which hinders the rapid progress of autonomous driving technology. Edge (corner) scenes, whether everyday but rare or emergent and dangerous, have scarce samples and are often related to irrational behavior and interaction 13,14, so a purely data-driven deep learning method has difficulty coping effectively with the "long tail" problem 15.
Encouragingly, quantum theory, which originated in microscopic physics, has been extended over the past two decades and has made great progress in many non-physical and macroscopic fields such as cognition, decision making, information, communication and computing. It has not only formed an increasingly mature theoretical system but has also been applied more and more widely 16. In particular, early quantum-inspired work in the field of mobile robots, the field most closely related to unmanned driving technology 16, shows the potential of applying quantum theory to the cognitive problems of autonomous driving. Quantum theory provides a new way to study the uncertain behavior (including irrational behavior) of humans.
Quantum decision theory. Quantum mechanics is one of the greatest discoveries of the last century; it has greatly promoted the development of modern science and technology and has become a theoretical pillar of emerging technologies. Scholars in the field of cognition have found that interference and entanglement in quantum mechanics share many characteristics with human cognition. This has motivated the use of the mathematical formalism of quantum mechanics in the cognitive field: quantum probability is introduced to build cognitive models that explain problems in human cognition which cognitive decision theory based on classical probability cannot explain. Quantum cognitive decision theory based on quantum probability has gradually emerged in this way 32. Quantum logic was put forward by the mathematician von Neumann, who defined an event as a subspace of a Hilbert space 32, so that quantum probability is not constrained by many rules of Boolean logic, such as the law of total probability. Quantum decision theory can therefore accommodate events that violate the law of total probability.
Busemeyer and Bruza pointed out that quantum logic is actually a kind of generalized Boolean logic, which does not have many constraints in Boolean logic, has greater flexibility and randomness, and is more conducive to explaining people's judgments and decisions 16 .
In the last 10 years, quantum cognitive decision theory has made a series of breakthroughs in the field of human cognition and has been recognized as a new way to explore human cognitive science 33. Quantum models [... 16, Quantum Game Theory (QGT) 16, etc.] produced by combining quantum probability with classical machine learning theories (MDP, POMDP, DBN, HMM, etc.) provide more advanced, effective and feasible theoretical tools for research on the cognitive decisions of autonomous driving systems. Song et al. introduced quantum cognitive decision-making theory into the autonomous driving field for the first time. For the pedestrian-crossing scene, they built a Quantum-like Bayesian Network (QLBN) model and compared it with a classical Bayesian model; the QLBN model explains well the irrational behavior observed in pedestrian crossing. They also built a quantum social force model and compared it with mainstream data-driven models, showing an advantage in pedestrian crossing trajectory prediction 37. Catarina used QLBN to analyze "prisoner's dilemma" cases and obtain predictions for similar events; by comparing them with reality, Catarina demonstrated the predictive ability of the quantum probability method 38.
To sum up, there is no systematic method for automatic driving decision-making that considers the irrational behaviors of human traffic participants and their interactions. Quantum decision-making theory has made great progress in recent years and provides a new method for studying automatic driving decision-making that considers the interaction of human traffic participants' behaviors (including irrational behaviors), but there are no research cases applied in the field of automatic driving at present. In this paper, the two-car game case is analyzed by QGT, which is the first attempt to apply quantum decision theory in the field of ADS.

Method
Classical probability and quantum probability. Assume that a system has an attribute A whose value can be up or down, and an attribute B whose value can be left or right. The biggest difference between quantum probability and classical probability is the existence of incompatible attribute pairs, that is, pairs of attributes that cannot be measured at the same time. Correspondingly, two attributes that can be measured at the same time constitute a compatible attribute pair. For the measurement of a single attribute, quantum probability and classical probability give exactly the same result, and for compatible attribute pairs there is still no difference between them. In other words, operations on compatible attributes in quantum probability cover all of classical probability theory. For incompatible attribute pairs, however, many rules of classical probability are no longer valid. Quantum probability therefore contains classical probability as the special case of compatible attributes and extends it with the additional rules needed for incompatible attributes.
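The effect of incompatibility can be illustrated with a minimal sketch. It assumes a two-dimensional real state space in which the value "up" of attribute A and the value "left" of attribute B are rays 45 degrees apart; the angles, the state and all function names are hypothetical choices made for illustration only. Measuring the two attributes in different orders then gives different probabilities, which cannot happen for compatible attributes.

```python
import math

def project(state, axis):
    """Project a 2-D real state vector onto the ray spanned by the unit vector `axis`."""
    overlap = axis[0] * state[0] + axis[1] * state[1]
    return (overlap * axis[0], overlap * axis[1])

def squared_length(vec):
    return vec[0] ** 2 + vec[1] ** 2

def sequential_probability(state, first, second):
    """Probability of observing `first` and then `second` (project twice, then square)."""
    return squared_length(project(project(state, first), second))

# Hypothetical geometry: "up" and "left" are rays 45 degrees apart (incompatible),
# and the state is a unit vector 30 degrees away from "up".
up = (1.0, 0.0)
left = (math.cos(math.pi / 4), math.sin(math.pi / 4))
state = (math.cos(math.pi / 6), math.sin(math.pi / 6))

p_up_then_left = sequential_probability(state, up, left)
p_left_then_up = sequential_probability(state, left, up)
print(p_up_then_left, p_left_then_up)  # the two orders give different values
```

If the two rays were orthogonal or identical, the two projections would commute and both orders would agree, which corresponds to the compatible case.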
The advantages of quantum probability methods in decision-making will be demonstrated below by comparing Bayesian Network (BN) and Quantum-like Bayesian Network (QLBN) .

Bayesian network (BN).
A Bayesian Network (BN) is a directed acyclic graph in which each node represents a random variable, and each edge represents a direct influence from the source node to the target node. The graph represents independence relationships between variables, and each node is associated with a conditional probability table that specifies a distribution over the values of a node given each possible joint assignment of values of its parents.
Bayesian networks can represent essentially any full joint probability distribution, which can be computed using the chain rule for Bayesian networks. Let G be a BN graph over the variables $X_1, X_2, \ldots, X_N$. We say that a probability distribution Pr over the same space factorises according to G if Pr can be expressed as the product 39
$$\Pr(X_1, \ldots, X_N) = \prod_{i=1}^{N} \Pr\left(X_i \mid Pa_{X_i}\right). \quad (1)$$
In Eq. (1), $Pa_{X_i}$ corresponds to all the parent variables of $X_i$. The graph structure of the network, together with the associated factorisation of the joint distribution, allows the probability distribution to be used effectively for inference (i.e. answering queries using the distribution as our model of the world). For some query Y and some observed variable e, exact inference in Bayesian networks is given by
$$\Pr(Y \mid e) = \alpha \Pr(Y, e) = \alpha \sum_{w \in W} \Pr(Y, e, w), \qquad \alpha = \frac{1}{\sum_{y \in Y} \Pr(Y = y, e)}.$$
Each instantiation of the expression $\Pr(Y = y, e)$ can be computed by summing out all entries in the joint that correspond to assignments consistent with y and the evidence variable e. The random variable W corresponds to the variables that are neither query nor evidence. The α parameter corresponds to the normalisation factor for the distribution $\Pr(Y, e)$. This normalisation factor comes from some assumptions that are made in Bayes rule 40.
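Exact inference by enumeration can be sketched as follows. The three-node network (W is the parent of both Y and E), its conditional probability tables and the function names are hypothetical and serve only to show the chain rule of Eq. (1), the summing-out of W and the normalisation by α.

```python
# Hypothetical binary network: W -> Y and W -> E.
p_w = {True: 0.3, False: 0.7}            # Pr(W)
p_y_given_w = {True: 0.9, False: 0.2}    # Pr(Y = True | W)
p_e_given_w = {True: 0.8, False: 0.1}    # Pr(E = True | W)

def joint(w, y, e):
    """Chain rule of Eq. (1): product of each node's table entry given its parents."""
    py = p_y_given_w[w] if y else 1.0 - p_y_given_w[w]
    pe = p_e_given_w[w] if e else 1.0 - p_e_given_w[w]
    return p_w[w] * py * pe

def query_y(e):
    """Pr(Y | E = e): sum out the hidden variable W, then normalise with alpha."""
    unnormalised = {
        y: sum(joint(w, y, e) for w in (True, False)) for y in (True, False)
    }
    alpha = 1.0 / sum(unnormalised.values())
    return {y: alpha * score for y, score in unnormalised.items()}

posterior = query_y(True)
print(posterior)
```

Enumeration is exponential in the number of hidden variables, so it is only practical for small networks such as this one.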
Quantum-like Bayesian network (QLBN). A QLBN can be defined by a pair $\langle G, P_g \rangle$, where G is a directed acyclic graph represented by a pair $G = (V, E)$. Each vertex $v_i \in V$ represents a random variable, which is a quantum state in the complex Hilbert space $H_G$, and E is the set of directed edges $e_j$ that represent the relationships between vertices. $P_g$ is a density operator on the composite state of independent complex Hilbert spaces with different dimensions. $P_g$ is defined on a fixed basis, so it satisfies the same conditional independence constraints as a BN, except that the actual probability values are replaced by complex probability amplitudes 40.
Quantum probabilities are computed using projective rules that involve three steps. First, the probabilities for all events are determined from a state vector $|z\rangle \in H$ of unit length (i.e., $\| |z\rangle \|^2 = 1$). This state vector depends on the preparation and context (person, stimulus, experimental condition). More is said about this state vector later, but for the time being, assume it is known. Second, to each event x there corresponds a projection operator $P_x$ that projects each state vector $|z\rangle \in H$ onto the subspace of the event. Finally, the probability of an event is equal to the squared length of this projection:
$$\Pr(x) = \left\| P_x |z\rangle \right\|^2.$$
Projection operators are characterized as being Hermitian and idempotent. To say P is Hermitian means that $P = P^{\dagger}$; in matrix terms, for every i and j, the entry $P_{i,j}$ in row i, column j of P and the entry $P_{j,i}$ in row j, column i of P are complex conjugates of each other. To say P is idempotent means $P^2 = P$. Figure 1 illustrates the idea of projective probability. In Fig. 1, the squared length of the projection of $|z\rangle$ onto the event is the probability of the event given the state $|z\rangle$.
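The three steps can be checked numerically with a small sketch. The event ray, the state vector and the helper names below are hypothetical; the sketch only verifies that a projector built as an outer product is Hermitian and idempotent and that the event probability is the squared length of the projected state.

```python
import math

def outer(vec):
    """Projector |e><e| onto the ray spanned by the unit vector `vec`."""
    return [[a * b.conjugate() for b in vec] for a in vec]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def dagger(m):
    """Conjugate transpose."""
    n = len(m)
    return [[m[j][i].conjugate() for j in range(n)] for i in range(n)]

def apply(m, vec):
    return [sum(m[i][j] * vec[j] for j in range(len(vec))) for i in range(len(m))]

def close(a, b, tol=1e-12):
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(n) for j in range(n))

# Hypothetical unit vectors: the event ray and the state |z>.
event = [complex(1 / math.sqrt(2), 0), complex(0, 1 / math.sqrt(2))]
state = [complex(0.6, 0), complex(0, 0.8)]

P = outer(event)
is_hermitian = close(P, dagger(P))        # P equals its conjugate transpose
is_idempotent = close(matmul(P, P), P)    # projecting twice changes nothing
probability = sum(abs(c) ** 2 for c in apply(P, state))  # squared length of P|z>
print(is_hermitian, is_idempotent, probability)
```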
In 40 , Jerome R. Busemeyer carried out a detailed derivation of the quantum probability distribution of single variable and multiple variables, which will not be repeated in this paper. QLBN is described in more detail below.
Let $H_G$ be a complex Hilbert space representing a QLBN, and let $H_1, H_2, \ldots, H_n$ be the collection of different Hilbert spaces that make up the QLBN. The network space $H_G$ is defined as the tensor product of these Hilbert spaces: $H_G = H_1 \otimes H_2 \otimes \cdots \otimes H_n$. The dimension of $H_G$ corresponds to the size of the full joint probability distribution. The random variables that constitute the network are represented by quantum states, which means that they are represented by complex probability amplitudes rather than by the real numbers used in a BN. In a QLBN, two types of quantum states need to be distinguished: (1) the state corresponding to a root node and (2) the state corresponding to a child node. A root node corresponds to a quantum pure state, which can be described by
$$|\psi\rangle = \alpha_0 |0\rangle + \alpha_1 |1\rangle,$$
where $|0\rangle$ and $|1\rangle$ are called the basis states and correspond to the basis vectors $[1, 0]^T$ and $[0, 1]^T$. The variables $\alpha_0$ and $\alpha_1$ are complex probability amplitudes of the form $\sqrt{r}\, e^{i\theta}$, $r \in \mathbb{R}$. The child nodes, on the other hand, represent a statistical distribution over different quantum states: a child node is represented as a set in which the conditional probabilities are the different quantum states 41. This is illustrated in Figs. 2 and 3: Figure 2 shows the representation of a pure state (root node), and Fig. 3 shows the representation of a set of states (child nodes). Since quantum probability amplitudes are complex numbers, a quantum state can be represented geometrically in different ways depending on the phase θ of the complex amplitudes.
In quantum theory, all independent quantum states contained in a Hilbert space are combined in a superposition state, represented by the quantum state vector $|S\rangle$, which covers the occurrence of all the events of the system. This is analogous to the full joint probability distribution of classical probability, except that the probabilities are expressed by complex probability amplitudes instead of real numbers. In this sense, the superposition state $|S_g\rangle$ contains all possible events in the Hilbert space $H_G$ and is given by
$$|S_g\rangle = \sum_{k_1, k_2, \ldots, k_n} \alpha_{k_1 k_2 \ldots k_n} |k_1 k_2 \ldots k_n\rangle,$$
where $k_1, k_2, \ldots, k_n$ index the basis states of each quantum state in the network and $\alpha_{k_1 k_2 \ldots k_n}$ is the corresponding complex probability amplitude. The purpose of the density operator $P_g$ is to describe a system in which we can calculate the probability of finding each state in the network. It is computed as the outer product of the superposition state $|S_g\rangle$ with itself 41:
$$P_g = |S_g\rangle \langle S_g|.$$
The density operator $P_g$ corresponds to an n × n Hermitian matrix, where n is the number of quantum states in the network; its main diagonal contains the full joint probability distribution of classical probability, so the diagonal elements sum to 1.
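A minimal numerical sketch of this construction is given below for two binary nodes. The joint probabilities and phases are hypothetical values chosen for illustration. The sketch builds the amplitudes of the superposition state, forms the density operator as an outer product, and confirms that the main diagonal reproduces the classical joint distribution while the off-diagonal entries carry the phase information.

```python
import cmath
import math

# Hypothetical joint distribution over two binary nodes, ordered |00>, |01>, |10>, |11>,
# and hypothetical phases for the corresponding amplitudes.
joint = [0.1, 0.2, 0.3, 0.4]
phases = [0.0, math.pi / 2, math.pi / 3, math.pi]

# Amplitudes sqrt(r) * exp(i * theta) of the superposition state.
amplitudes = [math.sqrt(r) * cmath.exp(1j * th) for r, th in zip(joint, phases)]

# Density operator as the outer product of the superposition state with itself.
density = [[a * b.conjugate() for b in amplitudes] for a in amplitudes]

diagonal = [density[k][k].real for k in range(4)]   # classical joint probabilities
trace = sum(diagonal)                                # 1 for a normalised state
off_diagonal_01 = density[0][1]                      # depends on the phase difference
print(diagonal, trace, off_diagonal_01)
```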
The density operator also contains quantum interference terms in its off-diagonal elements, which are at the heart of the model. It is through these off-diagonal elements that the quantum interference effect arises during inference, which makes the result deviate from completely rational probabilistic reasoning. Quantum states in a QLBN can therefore represent different degrees of rationality, as illustrated in Fig. 4: $|\psi_1\rangle$ is a perfectly rational and optimal decision (completely classical) that follows the expected utility axioms (closely related to rational choice theory in economics) 42; $|\psi_2\rangle$ and $|\psi_3\rangle$ predict sub-optimal decisions that deviate from expected utility theory but still provide a satisfactory utility (associated with bounded rationality theory); and $|\psi_4\rangle$ is a completely irrational (quantum) decision 43, reflecting choices that lead to lower utility (associated with contradictory decisions and cognitive biases).
During inference, subsystems of the quantum system need to be traced out of the large system represented by the density operator $P_g$, using the partial trace according to 41. At the same time, the calculated complex probability amplitudes are converted to actual probability values. Given an evidence variable e, the quantum marginal probability of a discrete random variable is obtained and the resulting scores are normalized. Expanding this expression gives the quantum marginalization formula (10), which is composed of two parts: the first part represents the classical probability, and the second part represents the quantum interference term, which is expressed by Formula (11):
$$\Pr(X \mid e) = \alpha \left( \sum_{i} |\psi_i|^2 + 2 \cdot \mathrm{Interference} \right), \quad (10)$$
$$\mathrm{Interference} = \sum_{i} \sum_{j > i} |\psi_i|\, |\psi_j| \cos(\theta_i - \theta_j), \quad (11)$$
where $\psi_i = |\psi_i| e^{i\theta_i}$ is the complex amplitude of the i-th assignment of the unobserved variables that is consistent with X and e, and α is the normalisation factor. In the above formula, if $\theta_i - \theta_j = \pi/2$, then $\cos(\theta_i - \theta_j) = 0$, which means that the quantum interference term vanishes and the QLBN collapses into a classical BN. In other words, we can think of a QLBN as a more general and abstract model than a classical network because it represents both classical and quantum behavior.
For normalization purposes, we assume that the decision maker is subject to the same quantum interference term everywhere, i.e. $(\theta_i - \theta_j) = \theta$. If $\cos\theta = 1$, then $\theta = 2k\pi$, $k \in \mathbb{Z}$, which corresponds to the maximum constructive interference that quantum probabilistic inference can achieve. Similarly, when $\cos\theta = -1$, i.e. $\theta = \pi + 2k\pi$, $k \in \mathbb{Z}$, the maximum destructive interference is achieved. As θ varies over $[0, \pi]$, the probability computed by quantum probabilistic inference covers the whole range of attainable values. The value of θ therefore represents the uncertainty in the decision-making process.
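The role of θ can be illustrated with a short sketch. It assumes a binary query in which each outcome is reached through two assignments of a hidden variable; the path probabilities and names are hypothetical. Each outcome's score is the classical part plus the interference part with a common phase difference θ, and the scores are then normalised.

```python
import cmath
import math

def quantum_marginal(paths, theta):
    """Normalised outcome probabilities for a common phase difference theta.

    `paths[outcome]` holds the classical probabilities of the two hidden-variable
    assignments consistent with that outcome; their amplitudes are sqrt(p) with
    phases 0 and theta.
    """
    scores = {}
    for outcome, (p1, p2) in paths.items():
        classical = p1 + p2
        interference = 2.0 * math.sqrt(p1 * p2) * math.cos(theta)
        scores[outcome] = classical + interference
    alpha = 1.0 / sum(scores.values())
    return {outcome: alpha * score for outcome, score in scores.items()}

# Hypothetical path probabilities for the two outcomes of a binary query.
paths = {"pass": (0.216, 0.014), "yield": (0.024, 0.056)}

classical_limit = quantum_marginal(paths, math.pi / 2)   # interference term vanishes
sweep = [quantum_marginal(paths, k * math.pi / 4)["pass"] for k in range(5)]

# The score is the squared magnitude of the summed amplitudes.
direct = abs(math.sqrt(0.216) + math.sqrt(0.014) * cmath.exp(1j * 1.0)) ** 2
expanded = 0.216 + 0.014 + 2.0 * math.sqrt(0.216 * 0.014) * math.cos(1.0)
print(classical_limit, sweep)
```

At θ = π/2 the result coincides with classical normalisation of the summed path probabilities; other values of θ move the probability away from the classical value.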
To sum up, quantum probability has wider physical meaning and properties than classical probability. Measurement is an important way to transform the illusory world of quantum into the real world, and human consciousness itself is transforming various possibilities into reality. This makes many scientists and philosophers think that quantum probability can not only describe the microscopic particle world, but also describe human consciousness and cognitive behavior 39 .

Comparison of the three models (classical game model, CPT model and QGT model).
In the context of related work in Part II, quantum decision model is the method of this paper to try to solve the irrational behavior and interaction of human traffic participants in autonomous driving. The following paper will compare the three models, so as to further illustrate the necessity of adopting quantum decision method in this paper.
Game model based on Markov (classical game model). Assume that two decision makers T and I play a game in which each can adopt strategy p or strategy y. The income matrix is given in Table 1. All the situations faced by the decision makers are defined in a space that includes the four ground states they may encounter 16: $I_pT_p$, $I_pT_y$, $I_yT_p$ and $I_yT_y$, where $I_iT_j$ means that decision maker T takes action j after decision maker I takes action i; the subscript 'p' indicates strategy p and 'y' indicates strategy y. These four ground states constitute a four-dimensional dynamic Markov model whose state is the column vector
$$\psi = [\psi_{pp}, \psi_{py}, \psi_{yp}, \psi_{yy}]^T.$$
This column vector gives the probability of every possible situation: $\psi_{ij}$ is the probability that decision maker T is in $I_iT_j$, and $\sum_i \sum_j \psi_{ij} = 1$. The following assumptions are made: when decision maker T does not know I's behavior, every ground state has the same probability, 0.25; when decision maker T knows I's behavior, the probability is distributed equally over the corresponding ground states. The initial behavior state vectors of decision maker T are then
$$\psi_0(0) = [0.25, 0.25, 0.25, 0.25]^T, \quad \psi_1(0) = [0.5, 0.5, 0, 0]^T, \quad \psi_2(0) = [0, 0, 0.5, 0.5]^T,$$
where $\psi_0(0)$, $\psi_1(0)$ and $\psi_2(0)$ are, respectively, the initial behavior state vectors of a decision maker T who does not know I's behavior, who knows that I intends to adopt strategy p, and who knows that I intends to adopt strategy y. After time t, the initial vectors become the final vectors $\psi_0(t)$, $\psi_1(t)$, $\psi_2(t)$, which represent the completed decision. This dynamic process is described by the solution of the Kolmogorov forward equation:
$$\frac{d\psi(t)}{dt} = K_A \psi(t), \qquad \psi(t) = e^{t K_A} \psi(0), \quad (14)$$
where $K_A$ is the strength (intensity) matrix, which is the key to solving the equation and is related to the income matrix under the different conditions. The entries of the strength matrix are determined by the utility functions $u_i$, which depend on the difference between the payoffs of the decision maker under the different decision conditions.
In classical Markov dynamic decision-making, the value of $u_i$ is limited to positive real numbers 44. For example, $u_p$ is defined as
$$u_p = u(X_{pp} - X_{py}), \quad (16)$$
where $X_{pp}$ and $X_{py}$ are the payoffs of decision maker T for adopting strategy p and strategy y, respectively, once T knows that I adopts strategy p. In Table 2, $X_{pp} = 10$ and $X_{py} = 5$, so $u_p = u(5)$; the corresponding payoff difference when I adopts strategy y is also 5, so $u_p = u_y = u(5)$.
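The Markov dynamics can be sketched numerically as follows. The block form of the strength matrix used here, one 2 × 2 block [[-1, u], [1, -u]] for each strategy of I, is an assumption made for illustration, as are the utility values, the decision time and all function names. The sketch evolves the three initial vectors and shows that the probability for the unknown case is always the average of the two known cases, so the classical model cannot produce a separation effect.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm(a, squarings=10, terms=12):
    """Matrix exponential: truncated Taylor series followed by repeated squaring."""
    n = len(a)
    scaled = [[x / 2 ** squarings for x in row] for row in a]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result, term = identity, identity
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

def intensity_matrix(u_p, u_y):
    """ASSUMED block form; every column sums to zero, so total probability is conserved."""
    return [[-1.0, u_p, 0.0, 0.0],
            [1.0, -u_p, 0.0, 0.0],
            [0.0, 0.0, -1.0, u_y],
            [0.0, 0.0, 1.0, -u_y]]

def prob_T_takes_p(psi0, u_p, u_y, t):
    """Evolve psi(t) = exp(t K_A) psi(0), then add the first and third entries."""
    transition = expm([[t * x for x in row] for row in intensity_matrix(u_p, u_y)])
    psi_t = [sum(transition[i][j] * psi0[j] for j in range(4)) for i in range(4)]
    return psi_t[0] + psi_t[2]

u_p, u_y, t = 2.0, 3.0, 1.0   # hypothetical utilities and decision time
unknown = prob_T_takes_p([0.25, 0.25, 0.25, 0.25], u_p, u_y, t)
known_p = prob_T_takes_p([0.5, 0.5, 0.0, 0.0], u_p, u_y, t)
known_y = prob_T_takes_p([0.0, 0.0, 0.5, 0.5], u_p, u_y, t)
print(unknown, known_p, known_y)
```

With $u_p = u_y$ the two known cases coincide, and the unknown case then equals both of them.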
When decision maker T executes the classical Markov dynamic decision, all the current situations are taken into consideration. For example, the probability that decision maker T takes strategy p is obtained by adding the first and third elements of $\psi(t)$, and the same applies to the other cases, namely:
$$\Pr(T_p) = \psi_{pp}(t) + \psi_{yp}(t), \qquad \Pr(T_y) = \psi_{py}(t) + \psi_{yy}(t).$$
Cumulative prospect theory (CPT) model. Let $\{a\} = \{a_1, a_2, \ldots, a_n\}$ be a set of n possible actions. For each action $a_i$, the possible state set is defined as $\{x_i\} = \{x_{i,1}, x_{i,2}, \ldots, x_{i,m}\}$, where $x_{i,j} \in \mathbb{R}$, $i = 1, \ldots, n$ and $j = 1, \ldots, m$. The probability of each state is $p_{i,j} = p(x_{i,j})$ and satisfies $\sum_{j=1}^{m} p_{i,j} = 1$. With $u(x_{i,j}, a_i)$ defined as the utility function of each action-state pair, the prospect under each decision $a_i$ can be expressed as $P_i = (u_i, p_i)$, where $u_i = [u(x_{i,1}, a_i), \ldots, u(x_{i,m}, a_i)]^T$ is the utility vector defined on the possible state set and $p_i = [p(x_{i,1}), p(x_{i,2}), \ldots, p(x_{i,m})]^T$ is the probability vector corresponding to $\{x_i\}$. The expected utility U of each decision can be written as $U(a_i) = U(P_i) = \sum_{j=1}^{m} u(x_{i,j}, a_i)\, p(x_{i,j})$. Cumulative Prospect Theory (CPT), proposed by Kahneman and Tversky, explains many biased or irrational human behaviors in a unified way. Compared with traditional Expected Utility Theory (EUT), CPT introduces two additional concepts in the definition of a prospect: (1) a value function V defined on utility, and (2) a decision weight function π defined on cumulative probability (as shown in Fig. 5). Each action is evaluated by
$$V(a_i) = \sum_{j} V(u^{+}_{i,j})\, \pi^{+}_{i,j} + \sum_{j} V(u^{-}_{i,j})\, \pi^{-}_{i,j}, \quad (18)$$
where the function V is strictly increasing and $u^{+}$ and $u^{-}$ are the gains and losses of u relative to the reference utility $u_0$. The decision weights $\pi^{\pm}$ are differences of a weighting function $w^{\pm}$ evaluated on cumulative probabilities, where $w^{\pm}$ is strictly increasing. In general, $V(u)$ is concave for $u \ge u_0$ (gain) and convex for $u \le u_0$ (loss), and it is steeper for losses than for gains.
Figure 5a shows an example of a value function when $u_0 = 0$ is set as the reference utility. Many experimental studies have shown that representative functional forms of V and w can be written as
$$V(u) = \begin{cases} (u - u_0)^{\alpha}, & u \ge u_0, \\ -\lambda (u_0 - u)^{\beta}, & u < u_0, \end{cases} \quad (20)$$
$$w^{+}(p) = \frac{p^{\gamma}}{\left(p^{\gamma} + (1-p)^{\gamma}\right)^{1/\gamma}}, \qquad w^{-}(p) = \frac{p^{\delta}}{\left(p^{\delta} + (1-p)^{\delta}\right)^{1/\delta}}, \quad (21)$$
where $\alpha, \beta, \gamma, \delta \in (0, 1]$ and $\lambda \ge 1$. As shown in Fig. 5b, this decision weight function describes well the observed behavior that humans tend to overestimate the occurrence of low-probability events and underestimate the occurrence of high-probability events. The CPT model assumes that the decision maker chooses the action that produces the maximum value defined in (18), that is, $a^{*} = \arg\max_{a_i \in \{a\}} V(a_i)$.
Quantum game theory (QGT) model. In the QGT model, decision maker T is in a superposition state before observing I. After observation, the superposition state is transformed into one of the possible ground states; the probability of each transformation is the squared magnitude of the probability amplitude $\psi_{ij}$, and the amplitudes satisfy the normalization condition, as shown in Eq. (22):
$$\sum_i \sum_j |\psi_{ij}|^2 = 1. \quad (22)$$
In the QGT model, the initial behavior state vectors of the decision maker are
$$\psi_0(0) = \left[\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}\right]^T, \quad \psi_1(0) = \left[\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}, 0, 0\right]^T, \quad \psi_2(0) = \left[0, 0, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right]^T.$$
In the same way as in the classical Markov dynamic model, after time t the initial vectors evolve into the final vectors $\psi_0(t)$, $\psi_1(t)$, $\psi_2(t)$, which also represent the completed decision. This dynamic process is described by the solution of the Schrödinger equation:
$$i \frac{d\psi(t)}{dt} = H_A \psi(t), \qquad \psi(t) = e^{-i t H_A} \psi(0), \quad (24)$$
where $H_A$ is a Hamiltonian matrix, which is the key to solving the equation. It is constructed in a similar way to the strength matrix (Eq. 25). Unlike in the classical Markov decision model, the utility functions $u_i$ range from −1 to 1. In addition, the QGT decision model should take into account the influence of irrational behavior, which Busemeyer called "cognitive dissonance" 34.
The relationship between belief and behavior is expressed by a matrix $H_B$. This irrational behavior matrix is added to the QGT decision model: a new Hamiltonian matrix $(H_A + H_B)$ is constructed and substituted into the solution of the Schrödinger equation.
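The quantum dynamics can be sketched in the same way as the Markov dynamics. The matrices used below are assumptions made for illustration: $H_A$ is taken as one normalised 2 × 2 block [[u, 1], [1, -u]] for each strategy of I, and $H_B$ as a Hermitian matrix scaled by γ that couples the two blocks. The utility values, γ, the decision time and the function names are hypothetical. The only properties relied on are that the total matrix is Hermitian, so the evolution preserves unit length, and that for γ = 0 the two blocks are uncoupled.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm(a, squarings=10, terms=12):
    """Matrix exponential: truncated Taylor series followed by repeated squaring."""
    n = len(a)
    scaled = [[x / 2 ** squarings for x in row] for row in a]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result, term = identity, identity
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

def hamiltonian(u_p, u_y, gamma):
    """ASSUMED form of H_A + H_B; both parts are real symmetric, hence Hermitian."""
    h = [[0.0] * 4 for _ in range(4)]
    for offset, u in ((0, u_p), (2, u_y)):           # H_A: one block per strategy of I
        scale = 1.0 / math.sqrt(1.0 + u * u)
        h[offset][offset] += u * scale
        h[offset][offset + 1] += scale
        h[offset + 1][offset] += scale
        h[offset + 1][offset + 1] -= u * scale
    coupling = [[1, 0, 1, 0], [0, -1, 0, 1], [1, 0, -1, 0], [0, 1, 0, 1]]
    for i in range(4):                                # H_B: couples the two blocks
        for j in range(4):
            h[i][j] -= gamma / math.sqrt(2.0) * coupling[i][j]
    return h

def evolve(psi0, u_p, u_y, gamma, t):
    """psi(t) = exp(-i t (H_A + H_B)) psi(0)."""
    unitary = expm([[-1j * t * x for x in row] for row in hamiltonian(u_p, u_y, gamma)])
    return [sum(unitary[i][j] * psi0[j] for j in range(4)) for i in range(4)]

def prob_T_takes_p(psi_t):
    """Squared magnitudes of the first and third amplitudes."""
    return abs(psi_t[0]) ** 2 + abs(psi_t[2]) ** 2

s = 1.0 / math.sqrt(2.0)
initial = {"unknown": [0.5] * 4, "known_p": [s, s, 0.0, 0.0], "known_y": [0.0, 0.0, s, s]}
u_p, u_y, t = 0.5, 0.5, math.pi / 2                  # hypothetical values

no_coupling = {k: prob_T_takes_p(evolve(v, u_p, u_y, 0.0, t)) for k, v in initial.items()}
with_coupling = {k: prob_T_takes_p(evolve(v, u_p, u_y, 1.0, t)) for k, v in initial.items()}
print(no_coupling, with_coupling)
```

With γ = 0 the unknown case is again the average of the two known cases; a non-zero γ lets amplitudes from the two blocks interfere, which is the mechanism by which the model can depart from the law of total probability.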
However, the quantum game model described above has a problem: when decision maker T knows the intention of I, the solutions $\psi_1(t)$ and $\psi_2(t)$ can be found, but when I's intention is unknown, the solution $\psi_0(t)$ cannot be obtained by the above method. The reasons are as follows. (1) The interaction is not fully considered: decision maker T considers only its own payoff and ignores I's payoff. In fact, when I's intention is uncertain, I's payoff should be considered. (2) The utility function is related to the payoffs of both parties, so I's payoff need not be considered when I's intention is known but must be considered when I's intention is uncertain. Previously, the utility function was obtained as $u_p = u_y = u(5)$ according to Eq. (16), but in fact $u_p$ is not necessarily equal to $u_y$, so the utility function is refined in the improvement described next.
When decision maker T does not know I's intention, the QGT model built above (Eqs. 24 and 25) is improved so that the payoffs of both decision maker T and decision maker I are considered. The Hamiltonian matrix is reconstructed with I's payoff added. In the reconstructed matrix, $H_{00}$ is the newly constructed Hamiltonian matrix; $H_{01}$ is the part for the payoff of decision maker T; $H_{02}$ is the part for the payoff of I; $H_{p1}$/$H_{y1}$ is the Hamiltonian matrix when I adopts strategy p/y; and $u_{p1}$/$u_{y1}$ is, when I takes strategy p/y, the difference between the payoff I earns when decision maker T adopts strategy p and the payoff I earns when decision maker T adopts strategy y.
The newly constructed matrix solves the first problem. The second problem, the oversimplified utility function, remains: a utility function is needed that effectively reflects the difference between the payoffs of the decision makers. We use the value function of prospect theory, as in Eq. (28), and normalize it appropriately to meet the QGT model's requirement on the value range of the utility function.
In Eq. (28), $D_p$ is, when I adopts strategy p, the difference in value for decision maker T between taking strategy p (payoff 10) and taking strategy y (payoff 5): $D_p = 10^a - 5^a$. It is normalized with a hyperbolic tangent function, similar to logistic regression. $D_y$ is the corresponding difference when I adopts strategy y, between taking strategy p (payoff 25) and taking strategy y (payoff 20): $D_y = 25^a - 20^a$. The power a is the risk aversion index of decision maker T in its own income dimension, with a value between 0 and 1. $D_{p1}$ is, when I adopts strategy p, the difference in value between I's payoff when decision maker T takes strategy p (I gets 10) and I's payoff when T takes strategy y (I gets 25): $D_{p1} = 10^b - 25^b$. $D_{y1}$ is, when I adopts strategy y, the difference in value between I's payoff when decision maker T takes strategy p (I gets 5) and I's payoff when T takes strategy y (I gets 20): $D_{y1} = 5^b - 20^b$. The power b is the risk aversion index of decision maker T in the income dimension of I. In a real-time game, to protect the benefit of decision maker T, T's own payoff is given more weight than that of I, so a and b are chosen such that $0 < b < a < 1$. In this paper $b = a/4$, because the separation effect is most obvious for this choice.
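The four utility values can be computed with a few lines of code. The sketch below assumes that the normalisation is a plain hyperbolic tangent of each value difference, which is one simple way to map the differences into the interval from −1 to 1; the function name and the example value of a are hypothetical.

```python
import math

def utilities(a):
    """Normalised utilities for risk aversion index a, with b = a / 4 as in the text.

    ASSUMPTION: each value difference D is normalised as tanh(D).
    """
    b = a / 4.0
    differences = {
        "u_p": 10 ** a - 5 ** a,     # T's own payoffs when I adopts strategy p
        "u_y": 25 ** a - 20 ** a,    # T's own payoffs when I adopts strategy y
        "u_p1": 10 ** b - 25 ** b,   # I's payoffs when I adopts strategy p
        "u_y1": 5 ** b - 20 ** b,    # I's payoffs when I adopts strategy y
    }
    return {name: math.tanh(d) for name, d in differences.items()}

example = utilities(0.5)
print(example)
```

For any a between 0 and 1 the utilities of T's own payoffs are positive and those of I's payoffs are negative, and $u_p$ and $u_y$ are no longer forced to be equal.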

Case study
In this paper, the two-car game scene is shown in Fig. 6, where the white car is the target vehicle and the red car is the interacting vehicle. The two cars are driving side by side at the junction of a viaduct and a side road; the target vehicle needs to turn left onto the viaduct and the interacting vehicle needs to turn right onto the side road, which creates the game scenario. From the first-person perspective of the target vehicle, there are three situations: it sees the interacting vehicle trying to speed up and pass, it sees the interacting vehicle slowing down to yield, or it is not sure of the interacting vehicle's intention. What decision does the target vehicle make in each of the three situations? According to the classical Markov dynamic decision model (Eq. 14), when the other vehicle's intention is uncertain, the probability that the target vehicle accelerates to pass is the same as when the other vehicle is known to accelerate or known to decelerate. This cannot explain the separation effect and obviously does not conform to the actual situation, so the calculation is omitted.
Simulation analysis based on CPT model. Two actions are defined for the vehicles, pass and yield, that is, $\{a\} = \{a_p, a_y\}$. Under the $a_p$ decision, the target vehicle has to account for the possibility that the interact vehicle will not yield, which can force the target vehicle, by law, to brake without passing. The $a_y$ decision, however, can be assumed to always succeed. Therefore, the prospects of $a_p$ and $a_y$ are:
$$P(a_p) = \big(u(\xi_{I,y}, \xi_{T,p}),\ p_{I,y},\ u(\xi_{I,ny}, \xi_{T,p}),\ 1 - p_{I,y}\big), \qquad P(a_y) = \big(u(\xi_{I,ny}, \xi_{T,y}),\ 1.0\big) \tag{29}$$
where $\{\xi^t_I, \xi^t_T\}$ is the set of historical vehicle trajectories, $I$ denotes the interact vehicle, $T$ denotes the target vehicle, $p_{I,y}$ is the yield probability of the interact vehicle, $\xi_{I,y}$ and $\xi_{I,ny}$ are the yield and non-yield trajectories of the interact vehicle, and $\xi_{T,p}$ and $\xi_{T,y}$ are the pass and yield trajectories of the target vehicle. Set $u_0 = 0$. Recalling the CPT model defined in (18)-(21), the CPT values of the target vehicle under the two decisions can be written as $V(a_p)$ and $V(a_y)$ (Eq. 30), and the decision of the target vehicle is
$$a^* = \arg\max_{a \in \{a_p, a_y\}} \{V(a_p), V(a_y)\} \tag{31}$$
The method in reference 45 obtains the parameters of CPT by Inverse Reinforcement Learning (IRL), assuming that $u$ is a linear combination of several features, including speed, acceleration, emergency braking and safety. The learned decision weighting function is shown in Fig. 7. The CPT model does capture people's choice pattern, namely that low-probability events are often overestimated while high-probability events are often underestimated. This result is consistent with research on human behavior in other fields such as economics, investment and the waiting paradox.
From the simulation results (Fig. 7), the CPT model can address irrational behavior in automatic driving decision-making, but this paper raises three objections: (1) The probability value calculated by CPT is based on classical probability and does not account for superposition states. For example, the action set contains only two actions, pass and yield, but in real scenes many interacting vehicles hesitate. (2) When CPT uses IRL to learn parameters, it is assumed that the interact vehicle will not yield when the target vehicle wants to pass and that the interact vehicle keeps its initial speed unchanged. This assumption is unrealistic: if the interact vehicle does not yield, it should increase its speed to signal the target vehicle, so the result obtained will differ from the actual situation. (3) CPT assumes that when the interact vehicle yields, the target vehicle will pass with 100% probability. This assumption is fully rational and does not match the actual situation.
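The pass/yield decision rule of the CPT model can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the parameterization learned by IRL in reference 45: the weighting and value functions use the commonly quoted Tversky-Kahneman forms with textbook parameter values (0.61, 0.88, 2.25), a single weighting curve is used for gains and losses, and the utilities passed to `decide` are hypothetical numbers.

```python
def weight(p, gamma=0.61):
    """Inverse-S probability weighting (Tversky-Kahneman form, assumed here)."""
    p = min(max(p, 0.0), 1.0)  # clamp to guard against float round-off
    return p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)


def value(u, u0=0.0, alpha=0.88, lam=2.25):
    """Value function relative to the reference point u0 (losses loom larger)."""
    d = u - u0
    return d ** alpha if d >= 0 else -lam * (-d) ** alpha


def cpt_value(prospect, u0=0.0):
    """CPT value of a prospect given as a list of (utility, probability) pairs."""
    gains = sorted((t for t in prospect if t[0] >= u0), key=lambda t: -t[0])
    losses = sorted((t for t in prospect if t[0] < u0), key=lambda t: t[0])
    total = 0.0
    for outcomes in (gains, losses):
        cumulative = 0.0
        for u, p in outcomes:
            # Decision weight = difference of weighted cumulative probabilities.
            pi = weight(cumulative + p) - weight(cumulative)
            total += pi * value(u, u0)
            cumulative += p
    return total


def decide(p_yield, u_pass_ok, u_pass_blocked, u_yield):
    """Choose between pass and yield by comparing V(a_p) and V(a_y)."""
    v_pass = cpt_value([(u_pass_ok, p_yield), (u_pass_blocked, 1.0 - p_yield)])
    v_yield = cpt_value([(u_yield, 1.0)])
    action = "pass" if v_pass >= v_yield else "yield"
    return action, v_pass, v_yield


if __name__ == "__main__":
    # Hypothetical utilities: +1 for a successful pass, -1 for a forced brake.
    print(decide(p_yield=0.9, u_pass_ok=1.0, u_pass_blocked=-1.0, u_yield=0.0))
    print(decide(p_yield=0.5, u_pass_ok=1.0, u_pass_blocked=-1.0, u_yield=0.0))
```

With these assumed curves, small probabilities receive a weight larger than the probability itself and large probabilities a smaller one, which is the over- and underestimation pattern described for Fig. 7.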
Simulation analysis based on QGT model. Compared [...] When the intention of the interact vehicle is uncertain, the probability that the target vehicle passes or yields is simulated with the improved QGT, and the benefits of both parties are considered from the perspective of the opponent's vehicle. The parameter variables are the quantum entanglement factor γ and the risk aversion index a, and the results are shown in Fig. 10.
In the improved model, when the quantum entanglement is 0, the probability that the target vehicle passes or yields collapses to that of the classical model, which is 0.5, and then fluctuates as the quantum entanglement and the benefit function increase. At maximum quantum entanglement, the improved QGT model is more inclined to pass (probability 0.75). Because the QGT model considers both irrational behavior and the interaction between the two vehicles, it is better suited to the actual situation than the CPT model.
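How an entanglement factor γ changes the outcome probabilities of a two-player, two-action game can be illustrated with the standard Eisert-Wilkens-Lewenstein (EWL) quantization scheme. This is a swapped-in textbook scheme, not the improved QGT model of this paper, and it is not expected to reproduce Fig. 10. The function names, the mapping of basis state 0 to yield and 1 to pass, and the sample strategies are assumptions made only for illustration.

```python
import numpy as np

# Flip operator of the EWL scheme; D (x) D is Hermitian and squares to identity.
D = np.array([[0, 1], [-1, 0]], dtype=complex)


def strategy(theta, phi):
    """Two-parameter unitary strategy U(theta, phi) of the EWL scheme."""
    return np.array(
        [
            [np.exp(1j * phi) * np.cos(theta / 2), np.sin(theta / 2)],
            [-np.sin(theta / 2), np.exp(-1j * phi) * np.cos(theta / 2)],
        ],
        dtype=complex,
    )


def entangler(gamma):
    """J(gamma) = exp(i * gamma/2 * D(x)D), written in closed form."""
    dd = np.kron(D, D)
    return np.cos(gamma / 2) * np.eye(4) + 1j * np.sin(gamma / 2) * dd


def outcome_probabilities(u_target, u_interact, gamma):
    """Probabilities of the joint outcomes |00>, |01>, |10>, |11>."""
    j = entangler(gamma)
    psi0 = np.zeros(4, dtype=complex)
    psi0[0] = 1.0  # both players start in basis state 0
    psi = j.conj().T @ np.kron(u_target, u_interact) @ j @ psi0
    return np.abs(psi) ** 2


def target_pass_probability(probs):
    """Marginal probability that the first player ends in state 1 (assumed: pass)."""
    return float(probs[2] + probs[3])


if __name__ == "__main__":
    # Hypothetical strategy: an equal mix of both actions for each vehicle.
    mixed = strategy(np.pi / 2, 0.0)
    for gamma in (0.0, np.pi / 4, np.pi / 2):
        probs = outcome_probabilities(mixed, mixed, gamma)
        print(gamma, target_pass_probability(probs))
```

In this sketch, γ = 0 leaves the initial state unentangled, so the joint probabilities factorize into the product of the two players' individual action probabilities. For γ > 0 the two choices become correlated through the entangled state.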

Experimental analysis
To verify the effectiveness of CPT and QGT, we use an experimental data set whose scene is similar to that of the simulation model. In the real scene, vehicles on the two main roads are allowed to change lanes within the red box, so this area is the center of game decision-making. We trained and tested the two models on a data set (Fig. 11) containing 348 pairs of interacting trajectories sampled at 10 Hz. To learn more generalizable results, we slice the trajectories into fixed-length frames using moving windows. Each frame contains 1 s of trajectory. In total, the 348 pairs of interacting trajectories generate 13,920 frames.
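The moving-window slicing can be sketched as follows. The window stride is not stated in the text, so a stride of one sample is assumed here, and the function name and the 49-sample example trajectory are hypothetical. At 10 Hz, a 1 s frame corresponds to 10 samples, and 13,920 frames over 348 pairs is an average of 40 frames per pair.

```python
def slice_into_frames(trajectory, window, stride=1):
    """Cut a trajectory (a sequence of samples) into fixed-length frames.

    Frames that would run past the end of the trajectory are dropped, so a
    trajectory shorter than the window yields no frames.
    """
    last_start = len(trajectory) - window
    return [trajectory[s:s + window] for s in range(0, last_start + 1, stride)]


if __name__ == "__main__":
    sample_rate_hz = 10                      # sampling frequency from the text
    window = int(1.0 * sample_rate_hz)       # 1 s frame -> 10 samples
    # Hypothetical trajectory pair of 49 samples (4.9 s); values are placeholders.
    pair = [(t / sample_rate_hz, 0.0, 0.0) for t in range(49)]
    frames = slice_into_frames(pair, window)
    print(len(frames), len(frames[0]))
```

A stride of one sample gives heavily overlapping frames; a larger stride would reduce the overlap at the cost of fewer frames per trajectory pair.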
The verification results of the two models are shown in Table 2. The results show that the decision-making accuracy of the QGT model is higher than that of the CPT model. The main reasons are as follows: (1) The probability value calculated by the CPT model is based on classical probability and does not consider superposition states. For example, the action set contains only two actions, speeding through and slowing down, with no state between them, so this method does not fundamentally solve the irrational decision-making problem. (2) In reference 45, the inverse reinforcement learning (IRL) of the CPT parameters did not take into account the mutual influence between the two interacting parties, resulting in a relatively low success rate. (3) Compared with CPT, QGT takes into account the superposition state in the action set, which is more consistent with the actual situation and does not require the assumption of complete rationality.
However, we also need to demonstrate the benefits of QGT in terms of data efficiency. Neural network models achieve good results in specific scenarios by training on large amounts of data 45, whereas QGT does not require large amounts of data. If its results are similar to, or better than, those of the neural network model, the QGT proposed here will have greater theoretical value.
The neural network algorithm is applied to the data set (Fig. 11). To achieve better performance for the learning-based model, we conducted two sets of experiments for the training of the neural network. The large discrepancy between the testing accuracies of the two experiments with the NN model is mainly due to over-fitting caused by insufficient data. Experiment 1 showed that the NN model trained on 80% of the trajectory pairs does not generalize well to other interaction pairs. Table 3 shows the results of the comparison. The results of the QGT model are close to those of the neural network model, and the QGT model does not need to be driven by a large amount of data, which makes it more data-efficient than the neural network model.
Figure 10. Probability that the target vehicle passes (left: Pr1) or yields (right: Pr2) when the intention of the interact vehicle is uncertain, after optimization (t = π/2).
Figure 11. Interaction scenes collected from the real road data set. The area where the game takes place is marked with the red box.

Conclusion and prospect
In this study, the QGT model was used to analyze the interaction between two vehicles. Compared with the classical Markov dynamic decision model, the QGT model is more practical and successfully explains the separation effect. Compared with the CPT model, the QGT model takes the superposition state in the action set into consideration and abandons the assumption of complete rationality, which is more consistent with the actual situation. When the intention of the opponent's vehicle is known, only the uncertainty and irrational behavior in the environment are considered, and a probability value that fits the actual situation is obtained. Otherwise, the payoffs of both sides of the game are also considered from the perspective of the opponent's vehicle, and a result consistent with the actual situation is obtained. According to the case analysis, the QGT model has more advantages than the classical Markov dynamic decision model and the CPT model in explaining the uncertainty, irrational behavior and interaction of other traffic participants. The CPT and QGT models were further verified on data sets, showing that the QGT model has more advantages in dealing with game scene decisions. At the same time, the results of the QGT model are close to those of the neural network model, and the QGT model does not need to be driven by a large amount of data, which makes it more data-efficient than the neural network model.
In future work, more complex traffic scenes (such as real road data sets) will be considered, and the interaction with other traffic participants in automatic driving will be further explored by combining quantum theory with deep learning.
Although the application research is carried out only in the simple two-car game scenario, the research method adopted is also instructive for more complex scenes in automatic driving. This paper is a first attempt to apply QGT to automatic driving, providing a new frame of reference for studying the decision-making problem of bounded rational behavior and interaction of human traffic participants. We believe that with the further development of quantum decision theory and the continued exploration of researchers, its application in autonomous driving will become more widespread and in-depth.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.